Combining SVM Classifiers for Email Anti-spam Filtering
نویسندگان
چکیده
Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and providers. Machine learning techniques such as the Support Vector Machines (SVM) have achieved a high accuracy filtering the spam messages. However, a certain amount of legitimate emails are often classified as spam (false positive errors) although this kind of errors are prohibitively expensive. In this paper we address the problem of reducing particularly the false positive errors in anti-spam email filters based on the SVM. To this aim, an ensemble of SVMs that combines multiple dissimilarities is proposed. The experimental results suggest that the new method outperforms classifiers based solely on a single dissimilarity and a widely used combination strategy such as bagging.
منابع مشابه
Ensemble of SVM Classifiers for Spam Filtering
Unsolicited commercial email also known as Spam is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques in order to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000) but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting and d...
متن کاملStacking Classifiers for Anti-Spam Filtering of E-Mail
We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the eff...
متن کاملSearching for Interacting Features for Spam Filtering
In this paper, we propose a novel feature selection method— INTERACT to select relevant words of emails for spam email filtering, i.e. classifying an email as spam or legitimate. Four traditional feature selection methods in text categorization domain, Information Gain, Gain Ratio, Chi Squared, and ReliefF, are also used for performance comparison. Three classifiers, Support Vector Machine (SVM...
متن کاملDetecting Image Spam Using Image Texture Features
Filtering image email spam is considered to be a challenging problem because spammers keep modifying the images being used in their campaigns by employing different obfuscation techniques. Therefore, preventing text recognition using Optical Character Recognition (OCR) tools and imposing additional challenges in filtering such type of spam. In this paper, we propose an image spam filtering tech...
متن کاملE-mail Spam Filtering with Local Svm Classifiers
This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the ...
متن کامل